Bioinformatics (Thomas Dandekar, Meik Kunz)


provides a wealth of information. It is important to remember that these are only functional annotations of elements; some of the elements are under only weak or no selection pressure. For a comparison between vertebrates, including humans, the UCSC Genome Browser (https://genome.ucsc.edu) is recommended. It now compares a whole zoo of different genomes with each other (https://genome-euro.ucsc.edu/cgi-bin/hgGateway) and also includes information from, e.g., the ENCODE project, such as methylation data, and predictions by RepeatMasker, such as LINEs.

How Can I Create a Phylogenetic Family Tree?

Phylogenetic trees provide an overview of functional and evolutionary relationships. A number of software options have been described in the book for this purpose. It is important to note that even a simple program like CLUSTAL gives better results with experience. Two versions are available online: https://www.ebi.ac.uk/Tools/msa/clustalo/ (newest version, CLUSTAL Omega) and https://www.genome.jp/tools/clustalw/ (a somewhat older version that aligns sequences pairwise over their whole length quite fast and draws a phylogenetic tree). With CLUSTAL it is important to use sequences of approximately the same length; in addition, depending on the presumed evolutionary distance, one can correct for this with appropriate scoring matrices. More complex software is correspondingly more complex to use. An example for accurate phylogenetic tree analysis is the PHYLogeny Inference Package (PHYLIP; https://evolution.genetics.washington.edu/phylip.html), which allows the construction of phylogenetic trees from sequences based on various methods, such as parsimony, likelihood, and bootstrapping (see the website for detailed documentation). Another option is the software MUSCLE (Multiple Sequence Comparison by Log-Expectation; https://www.drive5.com/muscle/), which, in addition to multiple alignment, computes a phylogenetic tree based, for example, on the methods UPGMA (Unweighted Pair Group Method with Arithmetic Mean; a fast method if there are many sequences) or neighbor joining (a better approximation to the true tree, but slow if there are too many sequences). The results from MUSCLE can also be saved in a format compatible with PHYLIP (Newick) and used there. Detailed documentation on MUSCLE can be found on the MUSCLE website (https://www.drive5.com/muscle/manual/) or on the EBI website (https://www.ebi.ac.uk/Tools/msa/muscle/help/).
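To make the UPGMA idea and the Newick output format more concrete, the following is a minimal sketch in Python. It is not the implementation used by MUSCLE or PHYLIP; the function name upgma, the input layout (a dictionary of pairwise distances) and the toy distances are assumptions chosen purely for illustration. The sketch repeatedly joins the two closest clusters, places the new node at half their distance, and averages distances weighted by cluster size.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa with UPGMA and return the tree as a Newick string.

    names: list of taxon labels (hypothetical example input)
    dist:  dict mapping frozenset({a, b}) -> distance between labels a and b
    """
    # Active clusters: id -> (Newick subtree, number of leaves, node height)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    # Pairwise distances between active clusters, keyed by cluster ids
    d = {frozenset((i, j)): dist[frozenset((names[i], names[j]))]
         for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)

    while len(clusters) > 1:
        # Pick the closest pair; ties are broken by the smaller ids
        pair = min(d, key=lambda p: (d[p], sorted(p)))
        i, j = sorted(pair)
        (t_i, n_i, h_i), (t_j, n_j, h_j) = clusters[i], clusters[j]
        height = d[pair] / 2.0  # the new node sits at half the distance
        subtree = f"({t_i}:{height - h_i:.2f},{t_j}:{height - h_j:.2f})"

        # Distance from the merged cluster to every other cluster:
        # arithmetic mean weighted by the number of leaves
        merged = {}
        for k in clusters:
            if k in (i, j):
                continue
            d_ik = d[frozenset((i, k))]
            d_jk = d[frozenset((j, k))]
            merged[frozenset((next_id, k))] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)

        # Drop all entries involving the two merged clusters
        d = {p: v for p, v in d.items() if i not in p and j not in p}
        d.update(merged)
        del clusters[i], clusters[j]
        clusters[next_id] = (subtree, n_i + n_j, height)
        next_id += 1

    (tree, _, _), = clusters.values()
    return tree + ";"


# Toy distances (invented for illustration, not real sequence data)
toy_names = ["A", "B", "C", "D"]
toy_dist = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("B", "C")): 6.0,
    frozenset(("A", "D")): 10.0,
    frozenset(("B", "D")): 10.0,
    frozenset(("C", "D")): 10.0,
}
print(upgma(toy_names, toy_dist))
```

The printed string is a Newick tree that tree viewers and PHYLIP-style programs can read. For real data, the distances would be derived from a multiple alignment rather than entered by hand, and neighbor joining would replace the simple averaging step when evolutionary rates differ between lineages.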

19.2 RNA: Sequence, Structure Analysis and Control of Gene Expression

How Do I Find and Analyze an RNA Sequence and Structure?

During transcription, an RNA is produced that has a secondary structure. One important database is Rfam. It is easy to look up and use and gives an overview of different RNA families including sequence and structure. There are different functional RNA classes, such as miRNAs and lncRNAs, which have an impact on gene expression. Important databases include miRBase (https://www.mirbase.org/) and LNCipedia (https://www.lncipedia.org/), which provide specific information on sequence, structure and functional

19 Tutorial: An Overview of Important Databases and Programs